The TigerSHARC DSP Architecture
Authors
Abstract
In the past two years, several multiple-data-path and pipelined digital signal processors have been introduced to the marketplace. This new generation of DSPs takes advantage of higher levels of integration than were available to its predecessors. It also incorporates multiple execution units on a single core as well as deep execution pipelines. For an introduction to recent trends in DSPs, see Eyre and Bier; for a comprehensive analysis of DSP chips, see the DSP buyer’s guide and Levy.

Here, we describe a new parallel DSP architecture called TigerSHARC. We focus on the computational aspects of its core and on-chip memory architecture. To sustain the high computation rates of cores with multiple execution units, memory subsystems must scale proportionately. We based our solution to the high-bandwidth demands of this parallel DSP core on a memory architecture characterized by what we call short-vector processor techniques. These techniques are essentially small-width vector processor interfaces.

In addition to the architectural description, we present an application example: a finite-length impulse response (FIR) filter. We use this example to illustrate a technique for mapping this class of algorithms to a parallel, vector-oriented processor. The FIR filter is a representative member of a large class of DSP algorithms, namely any structure with delay lines, such as infinite-length impulse response (IIR) structures, equalizers, and multirate filters, all of which share similar solutions. (Two-dimensional extensions of these algorithms, such as 2D filtering and convolution used in imaging, can also be solved using extensions to the techniques presented here.) To map this class of algorithms efficiently to this parallel DSP, we must address two related problems: the distribution of computation among several execution units, and the provision of adequate alignment between data and filter coefficients.
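The alignment problem named above can be made concrete with a small sketch. It is not taken from the paper: the vector width `V = 4` and the helper names `fir_direct`, `window_starts`, and `is_aligned` are assumptions chosen for illustration. The sketch computes a direct-form FIR and then lists where each tap's data window would start if `V` consecutive outputs were computed together, one output per lane.

```python
V = 4  # assumed short-vector width in elements (illustrative, not from the source)


def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], taking x[m] = 0 for m < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def window_starts(n0, num_taps):
    """Start index of the V-wide data window that tap k needs when the outputs
    y[n0 .. n0+V-1] are computed together: tap k reads x[n0-k .. n0-k+V-1]."""
    return [n0 - k for k in range(num_taps)]


def is_aligned(start):
    """True when a V-wide window starting at `start` sits on a vector boundary."""
    return start % V == 0


if __name__ == "__main__":
    starts = window_starts(8, 4)
    for k, s in enumerate(starts):
        print(f"tap {k}: window starts at x[{s}], aligned={is_aligned(s)}")
```

With `n0 = 8` and four taps, the windows start at indices 8, 7, 6, and 5, so only the first tap reads an aligned vector. Under these assumptions, a naive one-output-per-lane mapping needs an unaligned data window for most taps, which is the difficulty the abstract refers to.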
To map the delay line structure of the FIR, we apply a transformation to the algorithm and, as a result, expose its parallelism in a form suited to the target architecture. This algorithmic transformation produces a high-efficiency implementation by relying only on aligned short-vector memory accesses. This example also shows that the conventional single-instruction, multiple-data (SIMD) dispatch mechanism, although very effective in simple linear algebra and matrix operations, may be overly restrictive when applied to this class of DSP algorithms. As a result, non-SIMD execution is required to achieve high efficiency.
Jose Fridman, Zvi Greenfield
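As a rough illustration of computing an FIR from aligned short-vector accesses only, the sketch below regroups the convolution by aligned blocks. It is one possible regrouping and is not claimed to be the transformation used by the authors; the width `V = 4` and the names `load_aligned`, `pad_to_multiple`, and `fir_aligned_blocks` are hypothetical.

```python
V = 4  # assumed short-vector width in elements (illustrative, not from the source)


def pad_to_multiple(seq, width):
    """Zero-pad seq so its length is a multiple of width."""
    extra = (-len(seq)) % width
    return list(seq) + [0] * extra


def load_aligned(buf, block):
    """Return the V-element vector at index block * V, which is always aligned."""
    start = block * V
    return buf[start:start + V]


def fir_aligned_blocks(x, h):
    """FIR output y[n] = sum_k h[k] * x[n - k] for n < len(x), built only from
    aligned V-wide loads of the data and of the coefficients."""
    xp = pad_to_multiple(x, V)
    hp = pad_to_multiple(h, V)
    nx, nh = len(xp) // V, len(hp) // V
    y = [0] * ((nx + nh) * V)
    for m in range(nx):
        xv = load_aligned(xp, m)      # aligned data vector
        for q in range(nh):
            hv = load_aligned(hp, q)  # aligned coefficient vector
            base = (m + q) * V
            # Product xv[i] * hv[j] belongs to output index base + i + j, so
            # the products of one vector pair land on different outputs.
            for i in range(V):
                for j in range(V):
                    y[base + i + j] += xv[i] * hv[j]
    return y[:len(x)]


if __name__ == "__main__":
    print(fir_aligned_blocks([1, 2, 3, 4, 5], [1, 1, 1]))
```

In this regrouping every memory access is an aligned vector, but the products of a single data/coefficient pair accumulate into different outputs rather than lane by lane. That is loosely in the spirit of the abstract's remark that strict SIMD dispatch can be restrictive for delay-line algorithms.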
Similar resources
Modern DSP Architectures
In this seminar contribution I’m going to introduce modern DSP architectures. After giving a short overview of the history of digital signal processing, I will discuss the differences between digital signal processing and general-purpose computing. These differences have implications for the architecture of DSPs, which I am going to discuss briefly. The main part will introduce the TigerSha...
Improving DSP Performance with a Small Amount of Field Programmable Logic
We show a systematic methodology to create DSP + field-programmable logic hybrid architectures by viewing it as a hardware/software codesign problem. This enables an embedded processor architect to evaluate the trade-off between the increase in die area due to the field-programmable logic and the resultant improvement in performance or code size. We demonstrate our methodology with the implementatio...
NeuroMatrix® NM6403 DSP with Vector/Matrix engine
The paper describes the architecture of the NeuroMatrix® NM6403 DSP, designed for image processing, signal processing, and neural network emulation [1,2]. The paper includes a brief description of the processor structure and its instruction set. The NM6403 is the first DSP based on the NeuroMatrix® Core (NMC), which comprises an original 32-bit VLIW RISC processor and a 64-bit SIMD Vector co-processor (VCP...
Performance Analysis of a Chaos-Based Multi-User Communication System Implemented in DSP Technology
This paper presents the implementation of a multi-user chaos-based communication system in DSP. The system is based on the chaotic phase shift keying (CPSK) digital modulation scheme, where chaotic signals are used as the spreading sequences of a CDMA system. Using chaotic signals offers the advantages of increased security and higher system capacity compared with conventional sequences. The ai...
Using Genetic Programming for Source-Level Data Assignment to Dual Memory Banks
Due to their streaming nature, memory bandwidth is critical for most digital signal processing applications. To accommodate these bandwidth requirements digital signal processors are typically equipped with dual memory banks that enable simultaneous access to two operands if the data is partitioned appropriately. Fully automated and compiler integrated approaches to data partitioning and memory...
Journal title:
- IEEE Micro
Volume 20
Publication date 2000